Conversation

XanClic

@XanClic XanClic commented Aug 29, 2025

Summary of the PR

This PR contains fixes for fragmented guest memory, i.e. situations where a contiguous guest memory range does not translate into a contiguous slice in our userspace address space. Currently, that is not really an issue, but with virtual memory (where such discontinuities can occur on any page boundary), it will be.

(See also PR #327).

Specifically:

  • Add GuestMemory::get_slices(), which returns an iterator over slices instead of just a single one
  • Fix Bytes::read() and Bytes::write() to correctly work for fragmented memory (i.e. multiple try_access() closure calls)
  • Have Bytes::load() and Bytes::store() use try_access() instead of to_region_addr(), so they can at least detect if there is fragmentation, and return an error. (Their address argument being properly (naturally) aligned should prevent any problems with fragmentation.)
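The bullet points above can be sketched with a toy model (all names here are illustrative stand-ins, not vm-memory's actual API): a guest-contiguous range may map to several host-side fragments, so a correct `read()` has to walk an iterator of slices rather than assume a single one.

```rust
// Toy model of fragmented guest memory: one guest-contiguous range maps to
// several host-side byte slices. Names are illustrative, not vm-memory's API.
struct ToyMemory {
    fragments: Vec<Vec<u8>>, // each entry models one host mapping
}

impl ToyMemory {
    /// Rough analogue of the PR's `GuestMemory::get_slices()`: yield every
    /// fragment overlapping [addr, addr + count), trimmed to that range.
    fn get_slices(&self, addr: usize, count: usize) -> impl Iterator<Item = &[u8]> + '_ {
        let mut pos = 0usize;
        self.fragments.iter().filter_map(move |frag| {
            let start = pos;
            pos += frag.len();
            let lo = addr.max(start);
            let hi = (addr + count).min(start + frag.len());
            (lo < hi).then(|| &frag[lo - start..hi - start])
        })
    }

    /// Rough analogue of the fixed `Bytes::read()`: a single guest-contiguous
    /// read copies from each fragment in turn.
    fn read(&self, buf: &mut [u8], addr: usize) -> usize {
        let mut done = 0;
        for slice in self.get_slices(addr, buf.len()) {
            buf[done..done + slice.len()].copy_from_slice(slice);
            done += slice.len();
        }
        done
    }
}

fn main() {
    let mem = ToyMemory {
        fragments: vec![vec![1, 2, 3], vec![4, 5], vec![6, 7, 8, 9]],
    };
    let mut buf = [0u8; 4];
    // A guest-contiguous 4-byte read at address 2 spans all three fragments.
    let n = mem.read(&mut buf, 2);
    assert_eq!((n, buf), (4, [3, 4, 5, 6]));
    println!("read {n} bytes: {buf:?}");
}
```

The real `get_slices()` yields `Result`-wrapped `VolatileSlice`s (see the doc comment quoted later in this thread); the toy omits error handling to show only the fragmentation-handling shape.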

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
    • Note that this was not possible for patches 2 and 3, as explained in their respective commit messages.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

bonzini
bonzini previously approved these changes Aug 29, 2025
With virtual memory, seemingly consecutive I/O virtual memory regions
may actually be fragmented across multiple pages in our userspace
mapping.  Existing `descriptor_utils::Reader::new()` (and `Writer`)
implementations (e.g. in virtiofsd or vm-virtio/virtio-queue) use
`GuestMemory::get_slice()` to turn guest memory address ranges into
valid slices in our address space; but with this fragmentation, it is
easily possible that a range no longer corresponds to a single slice.

To fix this, add a `get_slices()` method that iterates over potentially
multiple slices instead of a single one.  We should probably also
deprecate `get_slice()`, but I’m hesitant to do it in the same
commit/PR.

(We could also try to use `try_access()` as an existing internal
iterator instead of this new external iterator, which would require
adding lifetimes to `try_access()` so the region and thus slices derived
from it could be moved outside of the closure.  However, that will not
work for virtual memory that we are going to introduce later: It will
have a dirty bitmap that is independent of the one in guest memory
regions, so its `try_access()` function will need to dirty it after the
access.  Therefore, the access must happen in that closure and the
reference to the region must not be moved outside.)

Signed-off-by: Hanna Czenczek <[email protected]>
read() and write() must not ignore the `count` parameter: The mappings
passed into the `try_access()` closure are only valid for up to `count`
bytes, not more.

(Note: We cannot really have a test case for this, as right now, memory
fragmentation will only happen exactly at memory region boundaries.  In
this case, `region.write()`/`region.read()` will only access the region
up until its end, even if the passed slice is longer, and so silently
ignore the length mismatch.  This change is necessary for when page
boundaries result in different mappings within a single region, i.e. the
region does not end at the fragmentation point, and calling
`region.write()`/`region.read()` would write/read across the boundary.
Because we don’t have IOMMU support yet, this can’t be tested.)

Signed-off-by: Hanna Czenczek <[email protected]>
When we switch to a (potentially) virtual memory model, we want to
compact the interface, especially removing references to memory regions
because virtual memory is not just split into regions, but pages first.

The one memory-region-referencing part we are going to keep is
`try_access()` because that method is nicely structured around the
fragmentation we will have to accept when it comes to paged memory.

`to_region_addr()` in contrast does not even take a length argument, so
for virtual memory, using the returned region and address is unsafe if
doing so crosses page boundaries.

Therefore, switch `Bytes::load()` and `store()` from using
`to_region_addr()` to `try_access()`.

(Note: We cannot really have a test case for this, as right now, memory
fragmentation will only happen exactly at memory region boundaries.  In
this case, `region.load()` and `region.store()` would have already
returned errors.  This change is necessary for when page boundaries
result in different mappings within a single region, but because we
don’t have IOMMU support yet, this can’t be tested.)

Signed-off-by: Hanna Czenczek <[email protected]>
@XanClic
Author

XanClic commented Aug 29, 2025

Sorry, messed up the safety formatting: replaced `// Safe:` with `// SAFETY:\n` to make clippy happy.

@bonzini
Member

bonzini commented Aug 29, 2025

@XanClic Oops, some missing safety comments

Member

@roypat roypat left a comment


I think maybe we can keep .get_slice() as a sort of utility function for when someone wants to get a contiguous slice, and where receiving something cross-region boundary would be an error condition.

/// The iterator’s items are wrapped in [`Result`], i.e. errors are reported on individual
/// items. If there is no such error, the cumulative length of all items will be equal to
/// `count`. If `count` is 0, an empty iterator will be returned.
fn get_slices<'a>(
Member


Can we reimplement try_access in terms of get_slices, to avoid the duplication of the iteration implementation? Or even deprecate try_access in favor of get_slices, since it seems to me to be the more powerful of the two?

Author


Maybe it can be deprecated, I don’t have a stake in that game. :)

try_access() is slightly different in that it passes the GuestMemoryRegion plus the address in that region, not the slice, so I don’t think it can be implemented via get_slices(). However, at least some (if not most) try_access() users only really use that region to generate a slice, so those could indeed be replaced by get_slices().

So we can definitely try and see whether the try_access() calls across rust-vmm (vm-memory, vm-virtio, vhost) could all be replaced with get_slices(), and if so, indeed deprecate it.

Member


I gave it a try for vm-memory, and it turns out the problem is less about operating on `VolatileSlice` instead of `GuestRegion`, and more that the iterator doesn't give us the offset for free, plus the fact that the iterator yields `Result`. But nothing that isn't solved using `try_fold` (compile-tested only):

diff --git a/src/bitmap/mod.rs b/src/bitmap/mod.rs
index 802ebef1ca72..c92b7e618cfd 100644
--- a/src/bitmap/mod.rs
+++ b/src/bitmap/mod.rs
@@ -288,16 +288,12 @@ pub(crate) mod tests {
 
         // Finally, let's invoke the generic tests for `Bytes`.
         let check_range_closure = |m: &M, start: usize, len: usize, clean: bool| -> bool {
-            let mut check_result = true;
-            m.try_access(len, GuestAddress(start as u64), |_, size, reg_addr, reg| {
-                if !check_range(&reg.bitmap(), reg_addr.0 as usize, size, clean) {
-                    check_result = false;
-                }
-                Ok(size)
-            })
-            .unwrap();
-
-            check_result
+            m.get_slices(GuestAddress(start as u64), len)
+                .try_fold(true, |check_result, slice| {
+                    let slice = slice?;
+                    Ok(check_result && check_range(slice.bitmap(), 0, slice.len(), clean))
+                })
+                .unwrap()
         };
 
         test_bytes(
diff --git a/src/guest_memory.rs b/src/guest_memory.rs
index b9d3b62e5227..09fd38d898c2 100644
--- a/src/guest_memory.rs
+++ b/src/guest_memory.rs
@@ -354,9 +354,12 @@ pub trait GuestMemory {
 
     /// Check whether the range [base, base + len) is valid.
     fn check_range(&self, base: GuestAddress, len: usize) -> bool {
-        match self.try_access(len, base, |_, count, _, _| -> Result<usize> { Ok(count) }) {
+        match self
+            .get_slices(base, len)
+            .try_fold(0, |acc, r| r.map(|slice| acc + slice.len()))
+        {
             Ok(count) => count == len,
-            _ => false,
+            Err(_) => false,
         }
     }
 
@@ -549,23 +552,13 @@ impl<T: GuestMemory + ?Sized> Bytes<GuestAddress> for T {
     type E = Error;
 
     fn write(&self, buf: &[u8], addr: GuestAddress) -> Result<usize> {
-        self.try_access(
-            buf.len(),
-            addr,
-            |offset, count, caddr, region| -> Result<usize> {
-                region.write(&buf[offset..(offset + count)], caddr)
-            },
-        )
+        self.get_slices(addr, buf.len())
+            .try_fold(0, |acc, slice| Ok(acc + slice?.write(&buf[acc..], 0)?))
     }
 
     fn read(&self, buf: &mut [u8], addr: GuestAddress) -> Result<usize> {
-        self.try_access(
-            buf.len(),
-            addr,
-            |offset, count, caddr, region| -> Result<usize> {
-                region.read(&mut buf[offset..(offset + count)], caddr)
-            },
-        )
+        self.get_slices(addr, buf.len())
+            .try_fold(0, |acc, slice| Ok(acc + slice?.read(&mut buf[acc..], 0)?))
     }
 
     /// # Examples
@@ -629,8 +622,9 @@ impl<T: GuestMemory + ?Sized> Bytes<GuestAddress> for T {
     where
         F: ReadVolatile,
     {
-        self.try_access(count, addr, |_, len, caddr, region| -> Result<usize> {
-            region.read_volatile_from(caddr, src, len)
+        self.get_slices(addr, count).try_fold(0, |acc, r| {
+            let slice = r?;
+            Ok(acc + slice.read_volatile_from(0, src, slice.len())?)
         })
     }
 
@@ -657,10 +651,12 @@ impl<T: GuestMemory + ?Sized> Bytes<GuestAddress> for T {
     where
         F: WriteVolatile,
     {
-        self.try_access(count, addr, |_, len, caddr, region| -> Result<usize> {
+        self.get_slices(addr, count).try_fold(0, |acc, r| {
+            let slice = r?;
             // For a non-RAM region, reading could have side effects, so we
             // must use write_all().
-            region.write_all_volatile_to(caddr, dst, len).map(|()| len)
+            slice.write_all_volatile_to(0, dst, slice.len())?;
+            Ok(acc + slice.len())
         })
     }

Author


but more that the iterator doesn't give us the offset for free

If you think it helps, we could of course add that, e.g. as part of Iterator::Item, though I’m happy with the try_fold() solution, too.
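The "offset as part of `Iterator::Item`" idea can be sketched as a small adapter (a hypothetical helper, not part of this PR; the real iterator yields `Result<VolatileSlice>`, while this toy uses plain byte slices):

```rust
// Sketch: wrap any slice iterator so each item carries its starting offset
// within the overall access, computed as a running sum of slice lengths.
fn with_offsets<'a, I>(slices: I) -> impl Iterator<Item = (usize, &'a [u8])>
where
    I: Iterator<Item = &'a [u8]>,
{
    slices.scan(0usize, |off, s| {
        let item = (*off, s);
        *off += s.len();
        Some(item)
    })
}

fn main() {
    let frags: [&[u8]; 3] = [b"ab", b"cde", b"f"];
    let v: Vec<(usize, &[u8])> = with_offsets(frags.iter().copied()).collect();
    assert_eq!(v, [(0, b"ab" as &[u8]), (2, b"cde" as &[u8]), (5, b"f" as &[u8])]);
    println!("{v:?}");
}
```

With such offsets, callers like `Bytes::write()` could index into their buffer directly instead of threading the offset through a `try_fold` accumulator.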

match unsafe { self.do_next() } {
Some(Ok(slice)) => Some(Ok(slice)),
other => {
// On error (or end), reset to 0 so iteration remains stopped
Member


Could implement FusedIterator, since after a single None we never return not-None ever again
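The `FusedIterator` suggestion is cheap to act on: it is a marker trait with no methods, so implementing it is just a promise about `next()`'s behavior. A minimal self-contained illustration (using a stand-in iterator, not the PR's actual type):

```rust
use std::iter::FusedIterator;

/// Stand-in for the PR's slice iterator: once it yields `None`, it stays done.
struct CountDown {
    remaining: u32,
}

impl Iterator for CountDown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None // `remaining` stays 0, so every later call also returns `None`
        } else {
            self.remaining -= 1;
            Some(self.remaining)
        }
    }
}

// The marker trait documents (and lets adapters like `Iterator::fuse()`
// rely on) the guarantee that `next()` keeps returning `None` after the
// first `None`.
impl FusedIterator for CountDown {}

fn main() {
    let mut it = CountDown { remaining: 2 };
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), Some(0));
    assert_eq!(it.next(), None);
    assert_eq!(it.next(), None); // fused: still `None`
    println!("fused iterator exhausted cleanly");
}
```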

Author


Right!

@@ -23,6 +23,7 @@
and `GuestRegionMmap::from_range` to be separate from the error type returned by `GuestRegionCollection` functions.
Change return type of `GuestRegionMmap::new` from `Result` to `Option`.
- \[[#324](https://github.com/rust-vmm/vm-memory/pull/324)\] `GuestMemoryRegion::bitmap()` now returns a `BitmapSlice`. Accessing the full bitmap is now possible only if the type of the memory region is known, for example with `MmapRegion::bitmap()`.
- \[[#339](https://github.com/rust-vmm/vm-memory/pull/339)\] Fix `Bytes::read()` and `Bytes::write()` not to ignore `try_access()`'s `count` parameter
Member


This should probably go into a Fixed section. Also, let's specify that this only applies to the blanket impl provided for T: GuestMemory

Author


OK, right!

Comment on lines +682 to +706
let expected = size_of::<O>();

let completed = self.try_access(
expected,
addr,
|offset, len, region_addr, region| -> Result<usize> {
assert_eq!(offset, 0);
if len < expected {
return Err(Error::PartialBuffer {
expected,
completed: 0,
});
}
region.store(val, region_addr, order).map(|()| expected)
},
)?;

if completed < expected {
Err(Error::PartialBuffer {
expected,
completed,
})
} else {
Ok(())
}
Member


I think this one (and the one below) would be a bit simpler in terms of get_slices maybe? Shouldn't that be something like

let mut iter = self.get_slices(addr, size_of::<O>());
let vslice = iter.next()?;
if iter.next().is_some() {
    return Err(Error::PartialBuffer { expected, completed: 0 });
}
vslice.store(val)

Member


heh, or just self.get_slice(addr, size_of::<O>())?.store(val)

Author


Ah, OK. If VolatileSlice::store() does the same as GuestMemoryRegion::store(), that will be much better indeed.

Member


It does! In fact, GuestMemoryRegion::store() just defers to VolatileSlice::store().

Btw, this function is exactly what made me go "mh, maybe there is a use for get_slice() after all". Atomic loads/stores that cross region boundaries cannot work, and that is not an implementation TODO that can be solved somehow (e.g. if the guest gives us a virt queue where an atomic goes across region boundaries, the best we can do is fail the device activation).

Author


That’s indeed true and a very good point; for atomic accesses, the reason why they shouldn’t go across boundaries is that they should be naturally aligned, which automatically prevents them from crossing e.g. page boundaries.

Then again, specifically for atomic accesses, I think users outside of vm-memory shouldn’t need to implement them and instead use what Bytes provides.

I still think it could be beneficial to force users to deal with an iterator even when they only expect a single slice because it might encourage them to write a comment why this is OK. Making get_slice() unsafe would achieve the same effect, but the thing is, it isn’t really unsafe, so it would technically be wrong. But maybe I’m overreaching and shouldn’t dictate how vm-memory users have to document their code…
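The alignment argument made two comments above is simple arithmetic and easy to check: a naturally aligned `size`-byte access (with `size` a power of two dividing the page size) lies entirely within one `size`-aligned block, and therefore within one page. A quick sanity check of that claim (illustrative code, not from the PR):

```rust
// An access [addr, addr + size) crosses a `boundary`-aligned border iff its
// first and last byte fall into different boundary-sized blocks.
fn crosses_boundary(addr: u64, size: u64, boundary: u64) -> bool {
    addr / boundary != (addr + size - 1) / boundary
}

fn main() {
    const PAGE: u64 = 4096;
    // Every naturally aligned 8-byte access in the first few pages stays
    // within a single page...
    assert!((0..PAGE * 4).step_by(8).all(|a| !crosses_boundary(a, 8, PAGE)));
    // ...while a misaligned access can straddle a page boundary.
    assert!(crosses_boundary(PAGE - 4, 8, PAGE));
    println!("naturally aligned accesses never cross page boundaries");
}
```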

@@ -24,6 +24,7 @@
Change return type of `GuestRegionMmap::new` from `Result` to `Option`.
- \[[#324](https://github.com/rust-vmm/vm-memory/pull/324)\] `GuestMemoryRegion::bitmap()` now returns a `BitmapSlice`. Accessing the full bitmap is now possible only if the type of the memory region is known, for example with `MmapRegion::bitmap()`.
- \[[#339](https://github.com/rust-vmm/vm-memory/pull/339)\] Fix `Bytes::read()` and `Bytes::write()` not to ignore `try_access()`'s `count` parameter
- \[[#339](https://github.com/rust-vmm/vm-memory/pull/339)\] Implement `Bytes::load()` and `Bytes::store()` with `try_access()` instead of `to_region_addr()`
Member


This should also specify that it's only relevant for the blanket impl

@XanClic
Author

XanClic commented Sep 2, 2025

I think maybe we can keep .get_slice() as a sort of utility function for when someone wants to get a contiguous slice, and where receiving something cross-region boundary would be an error condition.

Preface: For this PR, that is what is done. But, yes, I would prefer to deprecate get_slice(), and specifically for IoMemory, I prefer not to have it.

Users should just not expect IoMemory::get_slice() to work. For GuestMemory, eh, hard to say[1]. I don’t want to provide a function to users that is basically just fundamentally wrong, because nobody can ever promise that get_slice() will work.

Next, I can’t think of a valid reason for someone wanting to get a contiguous slice other than it being easier to handle, but that’s a problem with tooling. QEMU e.g. has good tooling around I/O vectors (a vector of buffers), so maybe we need a VolatileSliceVector, too.

Finally, what will users do when get_slice() fails because of fragmentation? The correct thing would be to fall back to get_slices(), but if they’re able to handle that, they should use it from the start. If they can’t handle it, it’ll probably become a fatal error, which is basically an implementation TODO and should be implemented.

Summarizing, I don’t find it good to facilitate users’ incomplete (and I would argue incorrect) implementations that work most of the time but aren’t guaranteed to work. I find there’s no better way to let users know instinctively that their implementation is incomplete than by forcing them to deal with an iterator and manually get exactly the first element. Rust makes it easy to ignore error handling, so I expect users to call get_slice()? and not worry about why exactly it can fail. I don’t expect anyone to put even a TODO comment on their get_slice() calls just because we put a note into the documentation recommending get_slices() instead.

In the end, innocent get_slice()? calls may then produce hard-to-reproduce crashes, and while a logged error message will probably immediately say what the problem is, this assumes that users experiencing those (rare) crashes will have the log files handy and post them.


[1] It just depends on the implementation and circumstances, and so is hard to say in general. In the vanilla vhost-user case with GuestMemoryMmap, guest-physically contiguous ranges are extremely likely to be contiguous within a single GuestMemoryRegion. But this may already change when memory hot-plugging comes into the picture, so it’s definitely not something we can promise.
